




A Proofs and Derivation
A.1 Proof for Theorem

Neural Information Processing Systems

Let us follow the notation in Alg. 3 of Argmax Flow. We can unfold the determinant along its i-th row. This is illustrated in Figure A.1, where the adaptive … Further details can be found in Table A.2. Furthermore, we will make the code used to reproduce these results publicly available. Different state encoders were used in different environments: an MLP encoder for the discrete control tasks and a CNN encoder for the Pistonball task.
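The row-unfolding referred to above is the standard Laplace (cofactor) expansion of a determinant. For an n × n matrix A = (a_{jk}), expanding along the i-th row gives:

```latex
\det(A) \;=\; \sum_{k=1}^{n} (-1)^{i+k}\, a_{ik}\, M_{ik},
```

where M_{ik} denotes the (i, k) minor of A, i.e. the determinant of the (n−1) × (n−1) matrix obtained by deleting row i and column k. (This is the general identity; the specific matrix being expanded here is the one defined in Alg. 3 of Argmax Flow.)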







A Appendix
A.1 Learning Curves


Table 6: Hyper-parameters for SAC (on Atari)

  Total steps                        1,000,000
  Replay buffer size                 100,000
  Discount factor                    0.99
  Learning start                     80,000
  Actor train frequency              4
  Critic train frequency             4
  Target network update frequency    8,000
  Actor learning rate                3 10
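As a minimal sketch, the complete entries of Table 6 can be collected into a plain configuration dict; the key names below are illustrative (not tied to any particular library), and the truncated actor-learning-rate entry is omitted:

```python
# Hyper-parameters for SAC on Atari, transcribed from Table 6.
# Key names are illustrative; values are taken directly from the table.
sac_atari_hparams = {
    "total_steps": 1_000_000,
    "replay_buffer_size": 100_000,
    "discount_factor": 0.99,
    "learning_start": 80_000,          # env steps before training begins
    "actor_train_frequency": 4,         # actor update every 4 env steps
    "critic_train_frequency": 4,        # critic update every 4 env steps
    "target_network_update_frequency": 8_000,
}

print(sac_atari_hparams["total_steps"])
```

Grouping the settings this way makes it straightforward to log the exact configuration alongside each run when reproducing the results.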